On Decision Boundaries of Naïve Bayes in Continuous Domains

Authors

  • Tapio Elomaa
  • Juho Rousu
Abstract

Department of Computer Science, University of Helsinki, Finland
{elomaa,rousu}@cs.helsinki.fi

Abstract. Naïve Bayesian classifiers assume the conditional independence of attribute values given the class. Despite this in practice often violated assumption, these simple classifiers have been found efficient, effective, and robust to noise. Discretization of continuous attributes in naïve Bayesian classifiers has achieved a lot of attention recently. Continuous attributes need not necessarily be discretized, but it unifies their handling with nominal attributes and can lead to improved classifier performance. We show that optimal partitioning results from decision tree learning carry over to Naïve Bayes as well. In particular, it sets decision boundaries on borders of segments with equal class frequency distribution. An optimal univariate discretization with respect to the Naïve Bayes rule can be found in linear time but, unfortunately, optimal multivariate optimization is intractable.

1 Introduction

The naïve Bayesian classifier, or Naïve Bayes, is surprisingly effective in classification tasks. Therefore, even if it does not belong to state-of-the-art methods, it plays an important role alongside decision tree learning as standard baseline methods of inductive algorithms. Naïve Bayesian classifiers have been studied extensively over the years [18, 19, 7].

Springer 2003; Lecture Notes in Artificial Intelligence 2838, pp. 144-155. N. Lavrač et al. (eds.): Knowledge Discovery in Databases: PKDD 2003. Proc. 7th European Conference, PKDD'03 (Cavtat-Dubrovnik, Croatia).
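The abstract's remark that decision boundaries fall on borders of segments with equal class frequency distribution can be illustrated with a small sketch. This is not the authors' algorithm, only an illustration under stated assumptions. The function name `segment_borders`, the midpoint placement of cut points, and the toy data are all hypothetical. The sketch groups examples with the same attribute value into a bin and reports a candidate cut point only where two adjacent bins have different relative class distributions. After the initial sort, it is a single linear pass.

```python
from collections import Counter
from itertools import groupby


def segment_borders(values, labels):
    """Return candidate cut points between adjacent bins whose
    relative class distributions differ (hypothetical helper)."""
    # Sort examples by attribute value; this is the only non-linear step.
    pairs = sorted(zip(values, labels), key=lambda p: p[0])

    # One bin per distinct value, holding that value's class counts.
    bins = [
        (value, Counter(label for _, label in group))
        for value, group in groupby(pairs, key=lambda p: p[0])
    ]

    borders = []
    for (v_a, c_a), (v_b, c_b) in zip(bins, bins[1:]):
        n_a, n_b = sum(c_a.values()), sum(c_b.values())
        # Compare relative frequencies exactly by cross-multiplying counts.
        same = all(c_a[k] * n_b == c_b[k] * n_a for k in set(c_a) | set(c_b))
        if not same:
            # Assumption: place the cut halfway between the two values.
            borders.append((v_a + v_b) / 2)
    return borders


if __name__ == "__main__":
    values = [1, 1, 2, 2, 3, 4, 5, 5]
    labels = ["a", "b", "a", "b", "a", "b", "b", "b"]
    print(segment_borders(values, labels))
```

In the toy data, the bins at values 1 and 2 share the same class distribution, as do the bins at 4 and 5, so no cut point is proposed between them. Only the borders around value 3 remain as candidates.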


Similar resources

Comparison of Decision Tree and Naïve Bayes Methods in Classification of Researcher’s Cognitive Styles in Academic Environment

In today's world of the internet, it is important to give users feedback based on what they demand. Moreover, classification is one of the important tasks in data mining. Today, several techniques are available for solving classification problems, such as Genetic Algorithm, Decision Tree, Bayesian and others. In this article, an attempt is made to classify researchers as “Expert” and “No...


Improvement of Decision Accuracy Using Discretization of Continuous Attributes

The naïve Bayes classifier has been widely applied to decision-making or classification. Because the naïve Bayes classifier prefers to deal with discrete values, a novel discretization approach is proposed in this paper to improve the naïve Bayes classifier and enhance decision accuracy. Based on the statistical information of the naïve Bayes classifier, a distributional index is defined in the ...


Using Bayesian networks for bankruptcy prediction: Some methodological issues

This study provides operational guidance for using naïve Bayes Bayesian network (BN) models in bankruptcy prediction. First, we suggest a heuristic method that guides the selection of bankruptcy predictors from a pool of potential variables. The method is based upon the assumption that the joint distribution of the variables is multivariate normal. Variables are selected based upon correlations...


Groundwater Potential Mapping using Index of Entropy and Naïve Bayes Models at Ardabil Plain

Although groundwater resources have long been regarded as a safe choice for meeting human water requirements, their overexploitation, especially at Ardabil plain, has led to a decrease in the quality and quantity of these resources. One significant solution is to identify the groundwater potential zones and exploit them according to their potential. The aim of th...



Publication date: 2003